Using Longest Common Subsequence Matching for Chinese Information Retrieval
Authors
Abstract
This paper adopts longest common subsequence (LCS) matching for Chinese information retrieval. We re-ranked the retrieved documents by a mixture of the original similarity score and the LCS score obtained by matching the document titles against the query. This LCS-based similarity score is also used in pseudo-relevance feedback (PRF) in various ways (e.g., selecting terms and filtering out documents with low LCS values). We evaluated the use of LCS in title re-ranking and PRF on the NTCIR-4 test collection for Chinese ad hoc information retrieval. For title queries, the best mean average precision (MAP) we achieved is 26.7% under rigid relevance judgement and 30.2% under relaxed relevance judgement.
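The title re-ranking step described above can be illustrated with a minimal Python sketch. The abstract does not give the exact mixture formula, so everything specific here is an assumption: the linear interpolation, the weight `alpha`, the normalisation of the LCS length by query length, character-level matching, and the function names `lcs_length`, `lcs_score`, and `rerank` are all hypothetical. The sketch also assumes the original similarity scores are already scaled to [0, 1].

```python
from typing import List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(query: str, title: str) -> float:
    """LCS length divided by query length (an assumed normalisation)."""
    if not query:
        return 0.0
    return lcs_length(query, title) / len(query)


def rerank(
    query: str,
    docs: List[Tuple[str, str, float]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank (doc_id, title, original_score) triples.

    The new score is an assumed linear mixture:
    (1 - alpha) * original_score + alpha * lcs_score(query, title).
    """
    rescored = [
        (doc_id, (1 - alpha) * score + alpha * lcs_score(query, title))
        for doc_id, title, score in docs
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = [("d1", "天气预报", 0.6), ("d2", "中文信息检索", 0.5)]
    for doc_id, score in rerank("信息检索", docs):
        print(doc_id, round(score, 3))
```

In this toy example the document whose title contains the whole query as a subsequence moves ahead of a document with a higher original score. Matching on characters rather than segmented words is a convenient choice for Chinese text in a sketch like this, but it is not confirmed by the abstract.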
Related resources
Speeding up transposition-invariant string matching
Finding the longest common subsequence (LCS) of two given sequences $A = a_0 a_1 \dots a_{m-1}$ and $B = b_0 b_1 \dots b_{n-1}$ is an important and well-studied problem. We consider its generalization, transposition-invariant LCS (LCTS), which has recently arisen in the field of music information retrieval. In LCTS, we look for the longest common subsequence between the sequences $A + t = (a_0 + t)(a_1 + t) \dots$ ...
Query Terms Extraction from Patent Document for Invalidity Search
This paper describes our patent retrieval system, which participated in the NTCIR-5 Patent Retrieval Task, Document Retrieval Subtask. The main aim of our method is appropriate query expansion to improve recall. We extracted query terms from the topic claim, and expanded query terms extracted from sentences explained in the patent document including the topic claim. The explanation sentences wer...
Time-Warped Longest Common Subsequence Algorithm for Music Retrieval
Recent advances in music information retrieval have enabled users to query a database by singing or humming into a microphone. The queries are often inaccurate versions of the original songs due to singing errors and errors introduced in the music transcription process. In this paper, we present the Time-Warped Longest Common Subsequence algorithm (T-WLCS), which deals with singing errors invol...
Application of Natural Language Processing Tools in Stemming
In the present work, an attempt is made to develop a novel conflation method that exploits the phonetic quality of words and uses standard NLP tools such as LD (Levenshtein Distance) and LCS (Longest Common Subsequence) for the stemming process. General Terms: Information Retrieval (IR), Stemming.
Fast and Cache-Oblivious Dynamic Programming with Local Dependencies
String comparison such as sequence alignment, edit distance computation, longest common subsequence computation, and approximate string matching is a key task (and often computational bottleneck) in large-scale textual information retrieval. For instance, algorithms for sequence alignment are widely used in bioinformatics to compare DNA and protein sequences. These problems can all be solved us...
Journal title:
- Journal of Chinese Language and Computing
Volume 15, Issue
Pages -
Publication date 2005